Sentence Compression as a Component of a Multi-Document Summarization System
Abstract
We applied a single-document sentence-trimming approach (Trimmer) to the problem of multi-document summarization. Trimmer was designed to compress a lead sentence into a space of tens of characters. In our Multi-Document Trimmer (MDT), we use Trimmer to generate multiple trimmed candidates for each sentence. Sentence selection is used to determine which trimmed candidates provide the best combination of topic coverage and brevity. We demonstrate that we were able to port Trimmer easily to this new problem. We also show that MDT generally ranked higher for recall than for precision, suggesting that MDT is currently more successful at finding relevant content than it is at weeding out irrelevant content. Finally, we present an error analysis that shows that, while sentence compression is making space for additional sentences, more work is needed in the area of generating and selecting the right candidates.
Similar resources
Multi-candidate reduction: Sentence compression as a tool for document summarization tasks
This article examines the application of two single-document sentence compression techniques to the problem of multi-document summarization—a “parse-and-trim” approach and a statistical noisy-channel approach. We introduce the Multi-Candidate Reduction (MCR) framework for multi-document summarization, in which many compressed candidates are generated for each source sentence. These candidates a...
SIMBA: An Extractive Multi-document Summarization System for Portuguese
This is a proposal for a demonstration of SIMBA at PROPOR 2012. SIMBA is an extractive multi-document summarization system that aims to produce generic summaries guided by a compression rate defined by the user. It uses a double-clustering approach to find the relevant information in a set of texts. In addition, SIMBA uses a sentence simplification procedure as a means to ensure summary compress...
Improving summarization performance by sentence compression: a pilot study
In this paper we study the effectiveness of applying sentence compression to an extraction-based multi-document summarization system. Our results show that purely syntax-based compression does not improve system performance. Topic signature-based reranking of compressed sentences does not help much either. However, reranking using an oracle showed that a significant improvement remains possible.
Generating Summaries Using Sentence Compression and Statistical Measures
In this paper, we propose a compression-based multi-document summarization technique that incorporates word bigram probability and a word co-occurrence measure. First, we implemented a graph-based technique to achieve sentence compression and information fusion. In the second step, we use hand-crafted, rule-based syntactic constraints to prune our compressed sentences. Finally we use probabilistic me...
On the Effectiveness of using Sentence Compression Models for Query-Focused Multi-Document Summarization
This paper applies sentence compression models for the task of query-focused multi-document summarization in order to investigate if sentence compression improves the overall summarization performance. Both compression and summarization are considered as global optimization problems and solved using integer linear programming (ILP). Three different models are built depending on the order in whi...